Methods and apparatus for communicating information in a supervised learning system

ABSTRACT

A method and apparatus for communicating accumulated state information between internal and external tasks in a supervised learning system. A supervised learning system encodes state information for a hypothetical learning task on initialization. This hypothetical learning task state information indicates that no training instances have been received. During supervised learning, training instances are presented to the supervised learner. The training instances are encoded with feature vector and target value information. For each task name paired with a non-default target value, the learner initializes a new learning task by copying the hypothetical learning task state representation for use as the state representation for the new learning task. Predictors are then produced for all learning tasks except the hypothetical learning task. The new training instance is used to update all learning tasks as specified in the target vector. The new training instance is then used to update the hypothetical learning task state representation as a negative example. Further training instances are handled similarly: new learning tasks are started based on examination of the sparse target vector for task name, target value pairs that match received training instance target values and for which tasks have not yet been started. The hypothetical state representation information is copied to create the initial state for each new task, thereby encapsulating the previous training instances in the new learning task's state representation.

[0001] This non-provisional application claims the benefit of U.S. Provisional Application No. 60/132,490 entitled “AT&T Information Classification System” which was filed on May 4, 1999 and U.S. Provisional Application No. 60/134,369 entitled “AT&T Information Classification System” which was filed May 14, 1999, both of which are hereby incorporated by reference in their entirety. The applicants of the Provisional Applications are David D. Lewis, Amitabh Kumar Singhal, and Daniel L. Stem (Attorney Docket No. 1999-0220) and David D. Lewis and Daniel L. Stem (Attorney Docket Nos. 1999-0139).

BACKGROUND OF THE INVENTION

[0002] 1. Field of Invention

[0003] This invention relates to the field of machine learning and information retrieval. More particularly, the present invention relates to the problem of communicating accumulated state information between tasks in a supervised learning system.

[0004] 2. Description of Related Art

[0005] Supervised learning is a well known technique for producing predictors. A supervised learner inputs a set of training instances and outputs a predictor. A training instance includes a feature vector and a target value. The feature vectors represent what is known about the training instance while the target values represent an output desired from the predictor given the feature vector as input. The feature vectors and target values can be single data items or complex data structures.
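For concreteness, a training instance might be represented as in the following sketch; the class and field names are illustrative assumptions, not part of the invention:

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class TrainingInstance:
    # Feature vector: what is known about the instance, keyed by feature name.
    features: Dict[str, float]
    # Target values: the outputs desired from a predictor, keyed by task name.
    targets: Dict[str, Any]

# A news wire article with word-count features and one target value.
instance = TrainingInstance(
    features={"word:kosovo": 3.0, "word:market": 0.0},
    targets={"World News": True},
)
```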

[0006] A predictor is a rule that the applier uses to produce a prediction from a feature vector. Most examples of predictors are mathematical functions, for example, linear regression models, boolean functions and neural networks. However, a predictor can also simply be a stored set of training instances, as when the applier performs k-nearest-neighbor classification.

[0007] For a given set of training instances a supervised learner creates a predictor. The predictor is then used by an applier. An applier takes as inputs a predictor and a feature vector and produces a prediction. This process is referred to as applying the predictor. The prediction can be a single data item or a complex data structure. An effective supervised learner creates predictors that, when applied to feature vectors similar to those seen in the training instances, produce predictions similar to the corresponding target values seen in the training instances.
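The division of labor between learner and applier might be sketched as follows; this is a minimal illustration assuming a deliberately trivial majority-vote learner, not any particular learner contemplated by the invention:

```python
from collections import Counter

def learn(training_instances):
    # A deliberately trivial supervised learner: the produced predictor
    # always predicts the most common target value seen in training.
    majority = Counter(t for _, t in training_instances).most_common(1)[0][0]
    return lambda feature_vector: majority

def apply_predictor(predictor, feature_vector):
    # The applier: given a predictor and a feature vector, produce a prediction.
    return predictor(feature_vector)

instances = [
    ({"word:kosovo": 3.0}, True),
    ({"word:market": 2.0}, False),
    ({"word:war": 1.0}, True),
]
predictor = learn(instances)
print(apply_predictor(predictor, {"word:peace": 1.0}))  # -> True
```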

[0008] In some instances, a portion of the training instances becomes available before other training instances, and it may be desirable to learn and apply predictors before all training instances become available. In this case it can be desirable to implement the supervised learner as an incremental supervised learner. An incremental supervised learner when initialized with a set of training instances will produce a predictor for each learning task. If later given new training instances, it will produce a new predictor for each learning task, taking into account all previously received training instances and the new training instances.

[0009] To accomplish this, an incremental supervised learner must retain a state representation which summarizes necessary information about previously received training instances. When presented with new training instances, the incremental supervised learner uses both the summary information about past training instances and the new training instances in producing both a new predictor for each learning task and a new state representation.

[0010] Incremental supervised learners use a variety of techniques to store state representation information. Some incremental supervised learners use a state representation which is simply a copy of all previously received training instances. Alternatively, an incremental supervised learner may use a state representation that attempts to identify and save only the most important training examples. Still other incremental supervised learners may use a state representation that includes other summary information which may be more compact or efficient. For example, a group of incremental supervised learners known as online learners can use the set of predictors themselves as the state representation.
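As an illustration of that last point, an online perceptron keeps no record of past training instances; its weight vector is at once the predictor and the entire state representation. The following is a sketch, not a learner required by the invention:

```python
def perceptron_step(weights, features, target):
    # One online learning step: the updated weight vector is simultaneously
    # the new predictor and the learner's complete state representation.
    score = sum(weights.get(f, 0.0) * v for f, v in features.items())
    if (score > 0.0) != target:
        sign = 1.0 if target else -1.0
        for f, v in features.items():
            weights[f] = weights.get(f, 0.0) + sign * v
    return weights

state = {}  # empty state: no training instances received yet
state = perceptron_step(state, {"word:kosovo": 3.0}, True)
state = perceptron_step(state, {"word:market": 2.0}, False)
```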

[0011] A supervised learner might be used, for example, to produce predictors to assign subject categories to news wire articles. A typical approach treats each category as a separate learning task. There would be two possible target values for each learning task: 1) True, indicating that the category should be assigned to the news wire article, and 2) False, indicating that the category should not be assigned to the news wire article. Similarly, the predictor trained for each task might have two possible predictions: 1) True, encoding a prediction that the category should be assigned to the news wire article, and 2) False, encoding a prediction that the category should not be assigned to the news wire article.

[0012] To accomplish the training, a person can read selected news wire articles and manually assign them to categories. The text of those news wire articles can be encoded as a feature vector appropriate for the supervised learner, and the human category decisions would be encoded as a target vector. The supervised learner would receive training data consisting of the appropriate feature vectors and target vectors and produce one predictor for each category. Those predictors could subsequently be used to assign categories to future news wire articles.
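One plausible encoding of this arrangement is sketched below; the helper function and category names are hypothetical, and a real system would likely use more sophisticated text features:

```python
def encode_training_instance(article_text, assigned_categories, all_categories):
    # Encode a manually categorized news wire article as a training instance:
    # word counts form the feature vector, and one True/False target per
    # known category forms the target vector.
    features = {}
    for word in article_text.lower().split():
        features[word] = features.get(word, 0.0) + 1.0
    targets = {category: (category in assigned_categories)
               for category in all_categories}
    return features, targets

features, targets = encode_training_instance(
    "NATO forces entered Kosovo", {"World News"}, ["World News", "Finance"])
# features == {"nato": 1.0, "forces": 1.0, "entered": 1.0, "kosovo": 1.0}
# targets  == {"World News": True, "Finance": False}
```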

[0013] If the supervised learner were an incremental supervised learner, the person could read additional news wire articles at a later point in time and provide new training instances to the incremental supervised learner. The incremental supervised learner could produce new predictors, generally with an improved ability to assign categories.

[0014] A difficulty arises for incremental supervised learners if the new training instances include target values for new learning tasks. In the above example, suppose that the person creates a new category to cover news wire articles about a new topic (e.g., “Kosovo War Stories”). In this example, the incremental supervised learner would receive a training instance containing a target value for a learning task that it has not been told to produce predictors for, and would fail to produce a predictor for this new task.

[0015] To date, several solutions have been proposed for this problem. One proposed solution is that when the incremental supervised learner is notified of a new learning task, the learner modifies its state representation to include this new task and records the fact that zero previous training instances have been seen for the new task. The learning of the predictor for the new task then begins with the first training instance for which a target value was explicitly encoded for the new learning task. This technique has the disadvantage that the supervised learner is not able to make use of the large collection of previously received training examples, which can usually be assumed to have had default target values for the new task.

[0016] Another proposed technique uses an incremental supervised learner whose state representation explicitly contains all previously seen training instances. When the incremental supervised learner is informed of the new learning task, it modifies its state representation to reflect the assumption that the previously received training instances had the default target value for the new training task. In this fashion, both previously received training instances and new training instances can be used in producing a predictor for the new learning task.

[0017] The problem with this second technique is that it requires altering the state representation used by the incremental learner, requiring additional complexity in the learning software. Furthermore, explicitly saving all the previous training examples as required by this technique may be a less efficient or less effective state representation than the state representation that might otherwise be used by the incremental learner.

SUMMARY OF THE INVENTION

[0018] The present invention provides a method and apparatus for adding new learning tasks to an incremental supervised learner. The present invention provides a flexible incremental representation of all training examples encountered, thereby permitting state representations for new learning tasks to take advantage of incremental training already completed by encoding all past training examples as negative examples for a hypothetical learning task. The state representation of the hypothetical learning task may then be copied as the initial state representation for a new learning task to be initiated. The new learning task would then be initialized with negative training examples of all previously presented training examples, permitting the learning task to incorporate the previous examples efficiently. This method and apparatus reduces software complexity and facilitates decomposition of machine learning tasks through increased sharing of training instance information across software components.
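The central copying step can be sketched in a few lines. This is an assumed illustration: the reserved task name is hypothetical, and `copy.deepcopy` stands in for whatever copying operation suits the learner's actual state representation:

```python
import copy

HYPOTHETICAL = "__hypothetical__"  # illustrative reserved task name

def start_new_learning_task(state_representations, task_name):
    # Initialize the new task by copying the hypothetical task's state,
    # which already encodes every past training instance as a negative
    # (default-target) example.
    state_representations[task_name] = copy.deepcopy(
        state_representations[HYPOTHETICAL])
```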

[0019] These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] The invention is described in detail with regard to the following figures, wherein like reference numerals refer to like elements, and wherein:

[0021] FIG. 1 is an exemplary block diagram of a supervised learner in accordance with the present invention;

[0022] FIG. 2 is an exemplary flowchart of an incremental supervised learner in accordance with the systems and methods of the present invention; and

[0023] FIG. 3 is an exemplary representation of the state representation storage for n training instances and m learning tasks.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0024] FIG. 1 shows a learning system 114 that includes a training portion 120 and an operating portion 122. The training portion 120 includes an incremental supervised learner 106 connected with a state representation storage 108 and a predictor storage 110. The operating portion 122 includes an applier 112 and the predictor storage 110. The state representation storage 108 and the predictor storage 110 can be implemented using any appropriate combination of alterable, volatile or non-volatile memory, or non-alterable or fixed memory. The alterable memory, whether volatile or non-volatile, can be implemented using any one or more of static or dynamic RAM, a floppy disk and disk drive, a writeable or rewriteable optical disk and disk drive, a hard drive, flash memory or the like.

[0025] Prior to operation, the incremental supervised learner 106 of the training portion 120 is first initialized with a hypothetical learning task, whose initial state representation, reflecting that no training instances have yet been received, is encoded into the state representation storage 108.

[0026] Once initialized, the incremental supervised learner 106 receives training instances 102 as inputs. The training instances 102 are made up of feature vectors and target values. A feature vector is a collection of feature values, which can be numeric, boolean, etc., such that corresponding feature values in different instances encode similar information about the instance. For example, a feature value might be the number of times a particular word occurs in a document, and the feature vector for the document is the set of feature values for each of a set of words. The use of feature vectors to represent instances is well known in the art. Feature vectors are discussed in Machine Learning, by Tom M. Mitchell, McGraw-Hill, 1997, which is incorporated by reference in its entirety. The feature vectors represent what is known about the training instance 102 while the target value represents the desired output if the feature vector were used as input to an appropriate predictor. Each training instance may reflect new learning tasks or the refinement of existing learning tasks. For each training instance 102 that reflects new learning tasks, the state representation 108 of the hypothetical learning task is copied to form the initial state representation of each new learning task.

[0027] The incremental supervised learner 106 also produces a predictor for each new learning task, or refines the predictor for each existing learning task, based on the learning task state representation and the current training instance. After the incremental supervised learner 106 updates all learning task state representations, it updates the hypothetical learning task state representation with the training instance as a default or negative example. The hypothetical learning task state representation is always updated to reflect each new training instance as a default or negative example.

[0028] During operation of the system, the application of the predictors generated during learning is accomplished by the applier 112 of the operating portion 122, which accepts as input feature vectors 104 and predictors from the predictor storage 110 and applies the predictors to produce a prediction 116 as to the appropriate categorization or classification to be given the input feature vectors 104.

[0029] FIG. 3 is an exemplary embodiment of the state representation storage 108 of the learning system 114 after n training instances have been received. For example, fields 312-320 show exemplary state representations for the hypothetical learning task (field 312) as well as learning task 1 through learning task m. It will be apparent that any number of learning tasks could be used without departing from the spirit and scope of the present invention. The state representation depicted in FIG. 3 is exemplary and not limiting, and any type of state representation storage may be used to practice the present invention.

[0030] FIG. 3, col. 312 illustrates the state representation storage 108 of the hypothetical learning task of the incremental supervised learner 106 after n consecutive training instances. The state representation for learning task 1 after n training instances have been received by the incremental supervised learner is illustrated in col. 314. The state representation for learning task 2 after n training instances have been received by the incremental supervised learner is shown in col. 316. The state representation for learning task 3 after n training instances have been received by the incremental supervised learner is shown in col. 318. The state representation for learning task m after n training instances is shown in col. 320.

[0031] After the incremental supervised learner's hypothetical learning task has been initialized, as illustrated by row entry 322 showing no training instances seen by the hypothetical learning task, the first training instance is received as shown at row 324. When the incremental supervised learner receives training instance example 1, it generates learning task 1, which is added to the list of active learning tasks.

[0032] Each learning task 314-320 on the active list of learning tasks is then analyzed with respect to the training instance. First, a determination is made whether the training instance is the first training instance for the learning task.

[0033] If the training instance is the first instance for this learning task then the learning task is a new learning task. A new learning task state representation is created by copying the hypothetical learning task state representation for use as the initial state representation for the new learning task. For example, the state representation for the hypothetical learning task, as shown by col. 312 through row entry 324, is copied and used to initialize the new learning task state representation. Predictors are then produced for the new learning task based on the learning task state representation and the current training instance. The new learning task state representation is then updated based on the existing learning task state representation and the current instance.

[0034] If no more learning tasks remain then the hypothetical learning task state representation is updated with the training instance as a negative example as shown by col. 312, row 324. It should be noted that predictors 110 are not produced for the hypothetical learning task.

[0035] Row entry 326 shows a second training instance presented as input to the incremental supervised learner. This training instance reflects a positive example of refinement to learning task 1, as well as generating new learning task 2, as indicated by row 326, cols. 314 and 316.

[0036] If the training instance was not the first training instance for the task, then predictors are produced for the learning task based on the learning task state representation and the current instance. The learning task state representation is then updated based on the existing learning task state representation and the current training instance, as shown by row entry 326 illustrating the update to the learning task 1 state representation as a result of training instance example 2.

[0037] Since the training instance was also a first training instance for learning task 2, an initial state representation is created by copying the hypothetical learning task state representation as shown in rows 324-326, col. 316.

[0038] In row 328 training instance example 3 is shown. This training instance adds learning task 3 to the list of learning tasks. Then it is determined that training instance 3 does not reflect a positive training example for learning task 1, as indicated at row 328, col. 314. The training instance does reflect a positive training example for learning task 2, as indicated by row 328, col. 316. Thus, a predictor 110 is produced based on the existing state representation, as shown in rows 324-326, col. 316, and the training instance.

[0039] Similarly, training instance example 3 reflects a positive training example for newly created learning task 3. Since training instance example 3 is the first instance for newly created learning task 3, a new state representation for learning task 3 is created by copying the current hypothetical learning task state representation, as shown in rows 324-326, col. 312, to initialize the state representation for learning task 3 as shown by rows 324-326, col. 318. Predictors are then produced based on the state representation for learning task 3 and the current training instance. It should therefore be apparent that each training instance may serve to update more than one learning task.

[0040] In row 334, training instance example (n) is received. This training instance reflects the refinement of learning task 1 as well as the creation of new learning task m.

[0041] For learning task 1, predictors 110 are produced based on the existing state representation reflected by col. 314, rows 324-332, and the new training instance example n. For learning task m, a new state representation is initialized with the state representation from the hypothetical learning task as represented by rows 324-332, col. 312. A predictor 110 for learning task m is then created based on the state representation for task m and the current training instance example n. This state representation is depicted in rows 324-332 of col. 320. A new state representation for learning task m is then produced, based on the initial state representation and training instance example (n), as represented by rows 324-334, col. 320. At this point there are no further learning tasks to be updated. The state representation 312 for the hypothetical learning task is then updated with training instance example (n) serving as a negative example, as indicated by rows 324-334, col. 312.

[0042] FIG. 2 is a flowchart illustrating an exemplary process of the present invention. The incremental supervised learner uses a hypothetical learning task which maintains a corresponding state representation encoding all training instances as negative training examples. This hypothetical task state representation is used by the incremental supervised learner to efficiently accumulate and transfer knowledge about training instances already encountered to each new learning task. The process starts at step 200, and control is transferred to step 205 where a state representation of the hypothetical learning task is created that reflects that no training instances have been seen by the incremental supervised learner. Control is passed to step 210 where an empty list of learning tasks is created, and control passes to step 215 where a training instance is received.

[0043] Control then proceeds to step 220 where all the learning tasks that have a non-default target value for this training instance are added to the list of learning tasks. Control is then transferred to step 225 where the first learning task in the list of learning tasks is retrieved. Control is then transferred to decision point step 230.

[0044] At step 230 a determination is made as to whether the current training instance is the first training instance associated with this learning task. If this training instance is not the first training instance for the current learning task, control is transferred to step 240. Otherwise, if this training instance is the first training instance for the current learning task, then control is transferred to step 235 where the hypothetical learning task state representation is copied to form the initial state representation for the new learning task. Control is then transferred to step 240.

[0045] In step 240, predictors are produced for the learning task based on the state representation for the learning task and the current training instance. Control is transferred to step 245.

[0046] In step 245, a new state representation for the learning task is produced based on the state representation for the learning task and the current training instance. Control is then transferred to decision point step 250.

[0047] In step 250, a determination is made whether any more learning tasks remain. If more learning tasks remain to be processed, control then returns to step 225 and the process is repeated for each remaining learning task. If no further learning tasks remain to be processed, then control proceeds to step 255.

[0048] At step 255, the hypothetical learning task state representation is updated treating the current training instance as having a default target value for the hypothetical learning task. Control is then transferred to step 260.

[0049] In step 260, a determination is made whether any training instances remain to be processed. If further instances exist, control is then transferred to step 215 and the process repeats for each remaining training instance. If no further training instances remain, control is then transferred to step 270 where the process ends.
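The flowchart of FIG. 2 might be realized roughly as in the following sketch. The incremental update and predictor-production operations shown here (simple positive/negative counts and a count-comparison predictor) are stand-in assumptions; any incremental supervised learning algorithm could be substituted:

```python
import copy

HYPOTHETICAL = "__hypothetical__"  # illustrative reserved task name
DEFAULT_TARGET = False             # default target value (negative example)

def update_state(state, features, target):
    # Stand-in incremental update: tally positive and negative examples.
    state["pos" if target else "neg"] += 1
    return state

def make_predictor(state):
    # Stand-in predictor: predict True if positives outnumber negatives.
    pos, neg = state["pos"], state["neg"]
    return lambda features: pos > neg

def incremental_supervised_learner(training_instances):
    # Steps 200-210: the hypothetical task's state says no instances have
    # been seen, and the list of learning tasks starts empty.
    states = {HYPOTHETICAL: {"pos": 0, "neg": 0}}
    predictors = {}
    for features, targets in training_instances:          # step 215
        # Step 220: tasks with non-default target values join the list.
        for task in targets:
            if task not in states:
                # Steps 230-235: first instance for this task, so copy the
                # hypothetical state as its initial state representation.
                states[task] = copy.deepcopy(states[HYPOTHETICAL])
        # Steps 225-250: process each learning task except the hypothetical.
        for task in states:
            if task == HYPOTHETICAL:
                continue
            target = targets.get(task, DEFAULT_TARGET)
            # Steps 240-245: produce a predictor and a new state for the
            # task from its state representation and the current instance.
            states[task] = update_state(states[task], features, target)
            predictors[task] = make_predictor(states[task])
        # Step 255: the hypothetical task treats the instance as having the
        # default target value; no predictor is produced for it.
        update_state(states[HYPOTHETICAL], features, DEFAULT_TARGET)
    return predictors                                      # steps 260-270

preds = incremental_supervised_learner([
    ({"word:kosovo": 3.0}, {"World News": True}),
    ({"word:market": 2.0}, {"Finance": True}),
])
# preds["Finance"]({}) -> False: the task inherited one negative example
# from the hypothetical state, so its one positive does not outnumber it.
```

Note how the second instance illustrates the invention's key property: the newly started Finance task inherits, through the copied hypothetical state, the earlier instance as a negative example it never saw directly.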

[0050] As shown in FIG. 1, the method of this invention is preferably implemented on a programmed general purpose computer. However, the invention can also be implemented on a special purpose computer; a programmed processor or microcontroller and peripheral integrated circuit elements; an application specific integrated circuit (ASIC) or other integrated circuit; a digital signal processor; a hardwired electronic or logic circuit, such as a discrete element circuit; a programmable logic device, such as a PLD, PLA, FPGA or PAL; or the like. In general, any device capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in FIG. 2 can be used to practice the invention described above.

[0051] While the invention has been described in conjunction with the specific embodiments thereof, it is evident that many alternatives, modifications, and variations will be apparent to those skilled in the art. Accordingly, preferred embodiments of the invention as set forth herein are intended to be illustrative, not limiting. Various changes may be made without departing from the spirit and scope of the invention as described in the following claims.

What is claimed is:
1. A method for communicating accumulated state information between tasks in a learning system, comprising: encoding initial state representation for a hypothetical learning task indicating that no training instances have been received; receiving a training instance; if the training instance received reflects a new learning task, initializing a new learning task state representation based on the hypothetical learning task state representation; updating each learning task state representation except the hypothetical learning task using a target value stored for that task in the training instance; and updating the state representation for the hypothetical learning task using a default target value for the training instance.
2. The method of claim 1, further including producing predictors for each learning task based on each learning task state representation.
3. The method of claim 1, wherein default target values reflect negative examples.
4. The method of claim 2, further including an applier that produces a prediction based on the predictor.
5. The method of claim 2 wherein the predictors are at least one of boolean functions, regression models and neural networks.
6. The method of claim 2 where the predictors are used by another learning system.
7. The method of claim 1 where the learning system is an incremental supervised learning system.
8. A system for communicating accumulated state information between tasks in a learning system, comprising: an incremental learner that receives training instances; a hypothetical learning task state representation storage that is initialized to indicate no training instances have been received and that is updated with the default target value for each new training instance; a state representation storage that stores an initialized new learning task state representation based on the hypothetical learning task state representation, that stores an updated state representation for each learning task based on the target value for the received training instance, and that updates the hypothetical learning task with a default target value for each received training instance.
9. The system of claim 8, further comprising a predictor storage which encodes a predictor based on each learning task state representation.
10. The system of claim 8, wherein the default target values reflect negative examples.
11. The system of claim 9, further comprising an applier that produces a prediction based on the predictor.
12. The system of claim 9 wherein the predictor storage encodes at least one of boolean functions, regression models and neural networks.
13. The system of claim 9 wherein the predictor storage is used by another learning system.
14. The system of claim 8 where the learning system is an incremental supervised learning system.